13 research outputs found

    Parallel Processes in HPX: Designing an Infrastructure for Adaptive Resource Management

    Advancements in cutting-edge technologies have enabled better energy efficiency as well as greater computational power for the latest High Performance Computing (HPC) systems. However, the complexity introduced by hybrid architectures and emerging classes of applications has led to poor computational scalability under conventional execution models. Alternative means of computation that address these bottlenecks are therefore warranted. More precisely, dynamic adaptive resource management, from both the system's and the application's perspective, is essential for better computational scalability and efficiency. This research presents and expands the notion of Parallel Processes as a placeholder for procedure definitions targeted at one or more synchronous domains, metadata for computation and resource management, and infrastructure for dynamic policy deployment. In addition, the research presents guidelines for a resource management framework in the HPX runtime system. Further, it lists design principles for the scalability of the Active Global Address Space (AGAS), a necessary feature for Parallel Processes. Finally, to verify the usefulness of Parallel Processes, a preliminary performance evaluation of different task scheduling policies is carried out using two applications: Unbalanced Tree Search, a reference dynamic graph application implemented in HPX by this research, and MiniGhost, a reference stencil-based application using the bulk synchronous parallel model.
The results show that different scheduling policies provide better performance for different classes of applications; even within the same application class, one policy fared better in certain instances while another fared better in others. This supports the hypothesis that a dynamic adaptive resource management infrastructure, capable of deploying different policies and task granularities, is needed for scalable distributed computing.
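The trade-off the abstract observes can be illustrated with a toy simulation (this is not HPX code; the two policies, the worker count, and the cost distributions are illustrative assumptions). A static block partition works well for uniform, stencil-like workloads, while a dynamic policy that balances load, roughly approximating a work-stealing scheduler, wins on skewed, UTS-like workloads:

```python
def static_makespan(costs, workers):
    # Static block partition: contiguous chunks, no load balancing.
    chunk = (len(costs) + workers - 1) // workers
    return max(sum(costs[i * chunk:(i + 1) * chunk]) for i in range(workers))

def dynamic_makespan(costs, workers):
    # Dynamic (greedy) policy: each task goes to the currently least-loaded
    # worker, approximating a work-stealing scheduler's balancing effect.
    loads = [0] * workers
    for c in costs:
        loads[loads.index(min(loads))] += c
    return max(loads)

uniform = [1] * 64                     # BSP/stencil-like: uniform task costs
skewed  = [1] * 60 + [16, 16, 16, 16]  # UTS-like: a few heavy subtrees

for name, costs in [("uniform", uniform), ("skewed", skewed)]:
    print(name, static_makespan(costs, 4), dynamic_makespan(costs, 4))
```

On the uniform workload both policies produce the same makespan, so the cheaper static policy is preferable; on the skewed workload the dynamic policy's makespan is far lower, which is the kind of workload-dependent behavior that motivates deploying different policies at runtime.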

    What does fault tolerant Deep Learning need from MPI?

    Deep Learning (DL) algorithms have become the de facto Machine Learning (ML) algorithms for large-scale data analysis. DL algorithms are computationally expensive: even distributed DL implementations that use MPI require days of training (model learning) time on commonly studied datasets. Long-running DL applications thus become susceptible to faults, requiring the development of a fault tolerant system infrastructure in addition to fault tolerant DL algorithms. This raises an important question: what is needed from MPI for designing fault tolerant DL implementations? In this paper, we address this problem for permanent faults. We motivate the need for a fault tolerant MPI specification through an in-depth consideration of recent innovations in DL algorithms and their properties, which drive the need for specific fault tolerance features. We present an in-depth discussion of the suitability of different parallelism types (model, data, and hybrid); the need (or lack thereof) for checkpointing of critical data structures; and, most importantly, several fault tolerance proposals in MPI (user-level fault mitigation (ULFM), Reinit) and their applicability to fault tolerant DL implementations. We leverage a distributed-memory implementation of Caffe, currently available under the Machine Learning Toolkit for Extreme Scale (MaTEx). We implement our approaches by extending MaTEx-Caffe to use a ULFM-based implementation. Our evaluation using the ImageNet dataset and the AlexNet and GoogLeNet neural network topologies demonstrates the effectiveness of the proposed fault tolerant DL implementation using OpenMPI-based ULFM.
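The recovery idea behind ULFM-style fault tolerance for data-parallel training can be sketched in a language-agnostic way (this is a toy, not the paper's MaTEx-Caffe code; the function name, rank count, and two-element gradients are assumptions). In data parallelism each rank computes a gradient on its shard and the ranks average them; under ULFM, when a rank suffers a permanent fault, the survivors shrink the group and keep averaging over the remaining ranks instead of aborting, where a real implementation would use MPI_Allreduce over a communicator repaired with MPIX_Comm_shrink:

```python
def data_parallel_step(shard_grads, failed=frozenset()):
    """Average per-rank gradients, ULFM-style: survivors 'shrink' the
    group and continue rather than aborting the whole job."""
    survivors = [g for rank, g in enumerate(shard_grads) if rank not in failed]
    if not survivors:
        raise RuntimeError("all ranks failed")
    # Elementwise average over the surviving ranks only.
    return [sum(vals) / len(survivors) for vals in zip(*survivors)]

grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(data_parallel_step(grads))              # all four ranks healthy
print(data_parallel_step(grads, failed={2}))  # rank 2 lost; ranks 0, 1, 3 continue
```

The shrink-and-continue step slightly changes the effective minibatch (the failed rank's shard drops out), which is one reason the paper weighs checkpointing of critical data structures against algorithmic tolerance.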

    Automated Generation of Integrated Digital and Spiking Neuromorphic Machine Learning Accelerators

    The growing number of application areas for artificial intelligence (AI) methods has led to an explosion in the availability of domain-specific accelerators, which struggle to support every new machine learning (ML) algorithm advancement. This clearly highlights the need for a tool to move quickly and automatically from algorithm definition to hardware implementation and to explore the design space along a variety of SWaP (size, weight, and power) metrics. The Software Defined Architectures (SODA) synthesizer implements a modular compiler-based infrastructure for the end-to-end generation of machine learning accelerators, from high-level frameworks down to hardware description language. Neuromorphic computing, which mimics how the brain operates, promises to perform artificial intelligence tasks at efficiencies orders of magnitude higher than current conventional tensor-processing-based accelerators, as demonstrated by a variety of specialized designs leveraging Spiking Neural Networks (SNNs). Nevertheless, mapping an artificial neural network (ANN) to solutions supporting SNNs is still a non-trivial and very device-specific task, and there is currently no way to design hybrid systems that integrate conventional and spiking neural models. In this paper, we discuss the design of such an integrated generator, leveraging the SODA Synthesizer framework and its modular structure. In particular, we present a new MLIR dialect in the SODA frontend that allows expressing spiking neural network concepts (e.g., spiking sequences, transformation, and manipulation), and we discuss how to enable the mapping of spiking neurons to the related specialized hardware (which could be generated through the middle-end and backend layers of the SODA Synthesizer). We then discuss the opportunities for further integration offered by the hardware compilation infrastructure, providing a path towards the generation of complex hybrid artificial intelligence systems.
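Why the ANN-to-SNN mapping is non-trivial can be seen in a minimal leaky integrate-and-fire (LIF) neuron sketch (a standard textbook model, not the SODA dialect or its generated hardware; the threshold and leak constants are illustrative assumptions). An ANN activation is a single real value, whereas an SNN must encode it over time, e.g. as a firing rate, so stronger inputs fire more often:

```python
def lif_spike_train(input_current, steps, threshold=1.0, leak=0.9):
    """Leaky integrate-and-fire neuron: integrate the input with leak,
    emit a spike and reset when the membrane potential crosses threshold."""
    v, spikes = 0.0, []
    for _ in range(steps):
        v = leak * v + input_current  # leaky integration
        if v >= threshold:
            spikes.append(1)
            v = 0.0                   # reset after firing
        else:
            spikes.append(0)
    return spikes

# A stronger input drives a higher firing rate (rate coding of an activation).
low = sum(lif_spike_train(0.2, 50))
high = sum(lif_spike_train(0.6, 50))
print(low, high)
```

Even this toy shows the device-specific choices a generator must make explicit, such as encoding scheme, time window, threshold, and leak, which is what a dedicated MLIR dialect for spiking concepts aims to capture.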

    SO(DA)^2: End-to-end Generation of Specialized Reconfigurable Architectures (Invited Talk)
